
Transforming Feature Space to Interpret Machine Learning Models

Brenning, Alexander

arXiv.org Machine Learning

Interpreting complex nonlinear machine-learning models is an inherently difficult task. A common approach is the post-hoc analysis of black-box models for dataset-level interpretation (Murdoch et al. 2019) using model-agnostic techniques such as permutation-based variable importance, and graphical displays such as partial dependence plots that visualize main effects while integrating over the remaining dimensions (Molnar, Casalicchio, and Bischl 2020). These tools are so far limited to displaying the relationship between the response and one (or sometimes two) predictor(s), while attempting to control for the influence of the other predictors. This can be rather unsatisfactory when dealing with a large number of highly correlated predictors, which are often semantically grouped. While the literature on explainable machine learning has often focused on dealing with dependencies affecting individual features, e.g. by introducing conditional diagnostics (Strobl et al. 2008; Molnar, König, Bischl, et al. 2020), no practical solutions are available yet for model interpretation in high-dimensional feature spaces with strongly dependent features (Molnar, Casalicchio, and Bischl 2020; Molnar, König, Herbinger, et al. 2020). These situations routinely occur in environmental remote sensing and other geographical and ecological analyses (Landgrebe 2002; Zortea, Haertel, and Clarke 2007), which motivated the present proposal to enhance existing model interpretation tools by offering a new, transformed perspective. For example, vegetation 'greenness' as a measure of photosynthetic activity is often used to classify land cover or land use from satellite imagery acquired at multiple time points throughout the growing season (Peña and Brenning 2015; Peña, Liao, and Brenning 2017). Spectral reflectances of equivalent spectral bands (the features) are usually strongly correlated within the same phenological stage, since vegetation characteristics vary gradually.
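The problem of correlated, semantically grouped features can be illustrated with a small sketch (all data, the least-squares "model", and the grouping are synthetic stand-ins, not taken from the paper): permuting a block of strongly correlated features jointly, rather than one column at a time, keeps their dependence structure intact while measuring their combined importance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two strongly correlated "spectral band" features plus noise.
n = 500
band1 = rng.normal(size=n)
band2 = band1 + 0.05 * rng.normal(size=n)    # nearly collinear with band1
noise = rng.normal(size=n)
X = np.column_stack([band1, band2, noise])
y = band1 + band2                            # response depends on the correlated pair

# A trivial stand-in "model": ordinary least squares (any black box would do).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(X):
    return X @ coef

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

baseline = mse(y, predict(X))

def grouped_permutation_importance(X, y, cols):
    """Permute a whole group of columns jointly and report the MSE increase."""
    Xp = X.copy()
    perm = rng.permutation(len(X))
    Xp[:, cols] = Xp[perm][:, cols]          # within-group dependence stays intact
    return mse(y, predict(Xp)) - baseline

# Permuting the correlated pair together avoids the unrealistic feature
# combinations created when each member is permuted on its own.
imp_group = grouped_permutation_importance(X, y, [0, 1])
imp_noise = grouped_permutation_importance(X, y, [2])
```

The joint permutation assigns the correlated band pair a large importance, while the irrelevant noise column gets an importance near zero.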


Global Database Fuels Machine-Learning Model Predictions

#artificialintelligence

GeoMark's RFDbase contains raw data from every major petroleum basin in the world that …


Benanza: Automatic $\mu$Benchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs

Li, Cheng, Dakkak, Abdul, Xiong, Jinjun, Hwu, Wen-mei

arXiv.org Machine Learning

As Deep Learning (DL) models have been increasingly used in latency-sensitive applications, there has been a growing interest in improving their response time. An important avenue for such improvement is to profile the execution of these models and characterize their performance to identify possible optimization opportunities. However, the current profiling tools lack the highly desired abilities to characterize ideal performance, identify sources of inefficiency, and quantify the benefits of potential optimizations. Such deficiencies have led to slow characterization/optimization cycles that cannot keep up with the fast pace at which new DL models are introduced. We propose Benanza, a sustainable and extensible benchmarking and analysis design that speeds up the characterization/optimization cycle of DL models on GPUs. Benanza consists of four major components: a model processor that parses models into an internal representation, a configurable benchmark generator that automatically generates micro-benchmarks given a set of models, a database of benchmark results, and an analyzer that computes the "lower-bound" latency of DL models using the benchmark data and informs optimizations of model execution. The "lower-bound" latency metric estimates the ideal model execution on a GPU system and serves as the basis for identifying optimization opportunities in frameworks or system libraries. We used Benanza to evaluate 30 ONNX models in MXNet, ONNX Runtime, and PyTorch on 7 GPUs ranging from Kepler to the latest Turing, and identified optimizations in parallel layer execution, cuDNN convolution algorithm selection, framework inefficiency, layer fusion, and using Tensor Cores.
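A minimal sketch of the "lower-bound" latency idea, with invented layer names and timings (not Benanza's actual data or API): sum each layer's best standalone micro-benchmark time, then compare against the measured end-to-end latency to expose framework overhead.

```python
# Hypothetical per-layer latencies (ms) measured by isolated micro-benchmarks.
layer_benchmarks = {
    "conv1": 0.42,
    "relu1": 0.03,
    "conv2": 0.51,
    "pool1": 0.05,
    "fc1":   0.12,
}

def lower_bound_latency(model_layers, benchmarks):
    """'Lower-bound' latency: the sum of each layer's best standalone time,
    i.e. the model's ideal latency if the framework added no overhead."""
    return sum(benchmarks[layer] for layer in model_layers)

ideal = lower_bound_latency(["conv1", "relu1", "conv2", "pool1", "fc1"],
                            layer_benchmarks)
measured = 1.58            # hypothetical end-to-end framework latency (ms)
overhead = measured - ideal  # the optimization opportunity in the framework
```

The gap between `measured` and `ideal` is what points at framework-level inefficiencies such as missed layer fusion or suboptimal algorithm selection.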


Deep learning model from Lockheed Martin tackles satellite image analysis

#artificialintelligence

The model, Global Automated Target Recognition (GATR), runs in the cloud, using Maxar Technologies' Geospatial Big Data platform (GBDX) to access Maxar's 100-petabyte satellite imagery library and millions of curated data labels across dozens of categories that expedite the training of deep learning algorithms. Fast GPUs enable GATR to scan a large area very quickly, while deep learning methods automate object recognition and reduce the need for extensive algorithm training. The tool teaches itself what the identifying characteristics of an object or target are, for example, learning how to distinguish between a cargo plane and a military transport jet. The system then scales quickly to scan large areas, such as entire countries. GATR uses common deep learning techniques found in the commercial sector and can identify airplanes, ships, buildings, seaports, etc. "There's more commercial satellite data than ever available today, and up until now, identifying objects has been a largely manual process," says Maria Demaree, vice president and general manager of Lockheed Martin Space Mission Solutions.


Analyzing the benefits of communication channels between deep learning models

Lacaille, Philippe

arXiv.org Machine Learning

As artificial intelligence systems spread to more diverse and larger tasks in many domains, the machine learning algorithms, and in particular the deep learning models and the datasets required to train them, are getting bigger themselves. Some algorithms allow large computations to scale by leveraging data parallelism. However, they often require a large amount of data to be exchanged in order to keep the knowledge shared across the compute nodes consistent. In this work, the effect of different levels of communication between deep learning models is studied, in particular how it affects performance. The first approach studied looks at decentralizing the numerous computations that are done in parallel in training procedures such as synchronous and asynchronous stochastic gradient descent. In this setting, a simplified communication scheme that consists of exchanging low-bandwidth outputs between compute nodes can be beneficial. In the following chapter, the communication protocol is slightly modified to further include training instructions. This is studied in a simplified setup where a pre-trained model, analogous to a teacher, can customize a randomly initialized model's training procedure to accelerate learning. Finally, a communication channel where two deep learning models can exchange a purposefully crafted language is explored, while allowing for different ways of optimizing that language.
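The high-bandwidth baseline that the thesis contrasts against, synchronous data-parallel SGD in which workers exchange and average full gradients every step, can be sketched on a toy least-squares problem (invented data, not the author's code):

```python
import numpy as np

# Synchronous data-parallel SGD sketch: each worker computes a gradient on its
# own data shard, then all workers exchange and average gradients. This
# all-reduce of full gradients is the high-bandwidth step the thesis contrasts
# with exchanging low-bandwidth model outputs instead.
rng = np.random.default_rng(1)
w = np.zeros(3)                        # shared model parameters
true_w = np.array([1.0, -2.0, 0.5])    # target the toy problem recovers

def worker_gradient(w, n=64):
    """One worker's mean-squared-error gradient on a fresh data shard."""
    X = rng.normal(size=(n, 3))
    y = X @ true_w
    return 2 * X.T @ (X @ w - y) / n

lr, n_workers = 0.05, 4
for step in range(200):
    grads = [worker_gradient(w) for _ in range(n_workers)]
    w -= lr * np.mean(grads, axis=0)   # all-reduce: average, then apply
```

Note that every step ships a full gradient per worker; the thesis's decentralized variants reduce exactly this exchange volume.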


Machine Learning Model Metrics

@machinelearnbot

Kangaroo Kapital is the largest credit card company in Australia. Animals across the continent use Kangaroo Kapital credit cards to make all of their daily purchases, racking up points in the company's reward system. Since Australian animals have traditionally not worn much clothing, the challenges of carrying around cash are substantial. Only having to keep track of a single credit card is a big help for your average working wallaby. But, since no clothes means no pockets, even keeping track of one credit card can be problematic.


4 Reasons Your Machine Learning Model is Wrong (and How to Fix It)

@machinelearnbot

When we build these models, we always use a set of historical data to help our machine learning algorithms learn the relationship between a set of input features and a predicted output. We'll show how you can evaluate these issues by assessing metrics of bias vs. variance and precision vs. recall, and present some solutions that can help when you encounter such scenarios. If we were to train a machine learning model and it learned to always predict an email as not spam (the negative class), then it would be accurate 99% of the time despite never catching the positive class. Similarly, increasing the number of training examples can help in cases of high variance, helping the machine learning algorithm build a more generalizable model.
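The 99%-accuracy spam example can be made concrete with a short sketch (hypothetical counts): a degenerate model that never predicts spam scores high on accuracy while its recall is zero.

```python
# Accuracy paradox sketch: a "model" that always predicts not-spam on a
# dataset where 1% of emails are spam (hypothetical class balance).
n_emails, n_spam = 1000, 10
tp = 0                        # spam correctly flagged (never happens here)
fn = n_spam                   # spam missed
tn = n_emails - n_spam        # ham correctly passed through
fp = 0                        # ham wrongly flagged

accuracy = (tp + tn) / n_emails   # 0.99 despite catching no spam at all
recall = tp / (tp + fn)           # 0.0: the positive class is never found
# Precision (tp / (tp + fp)) is undefined here, since nothing is predicted
# positive; that in itself is a red flag for this classifier.
```

This is why precision and recall, not raw accuracy, are the metrics to watch on imbalanced classes.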


How to Train a Final Machine Learning Model

@machinelearnbot

In this post, you discovered how to train a final machine learning model for operational use. The machine learning model that we use to make predictions on new data is called the final model. There can be confusion in applied machine learning about how to train a final model. This post will clear up the confusion.
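The usual resolution of that confusion can be sketched as follows (toy data and a least-squares stand-in for the learner, not the post's own code): cross-validation estimates the skill of the modeling *procedure*, and the final model is then fit once on all available data.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def fit(X, y):
    # Placeholder learner: least squares on centered labels
    # (stands in for any model-training routine).
    coef, *_ = np.linalg.lstsq(X, y - 0.5, rcond=None)
    return coef

def accuracy(coef, X, y):
    return float(np.mean((X @ coef > 0) == (y > 0.5)))

# 1) k-fold cross-validation estimates how well the *procedure* performs...
k = 5
folds = np.array_split(rng.permutation(len(X)), k)
scores = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    scores.append(accuracy(fit(X[train_idx], y[train_idx]),
                           X[test_idx], y[test_idx]))
cv_estimate = float(np.mean(scores))

# 2) ...then the final model is fit once on ALL the data and deployed;
# the fold models are discarded.
final_model = fit(X, y)
```

The key point: the cross-validation models exist only to estimate skill; the model you ship is trained on everything.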


Introduction to Deep Learning Models with TensorFlow (Online Code, Part 2)

@machinelearnbot

The lessons look at the key mathematical foundations of deep learning models, giving you insight into what makes these techniques work. Created for software engineers and budding data scientists, the course requires basic familiarity with Python programming; as well as statistics concepts such as linear and logistic regression, machine learning concepts like classification, and linear algebra. Jupyter Notebook is used to write and run code. Lucas Adams is a senior level machine learning engineer at Jet.com, where he deploys TensorFlow for computer vision and natural language processing systems.


'One machine learning model to rule them all': Google open-sources tools for simpler AI (ZDNet)

#artificialintelligence

Google researchers have created what they call "one model to learn them all" for training AI models in different tasks using multiple types of training data. Also, models are often trained on tasks from the same "domain", such as translation tasks being trained with other translation tasks. The model it created is trained on a variety of tasks, including image recognition, translation tasks, image captioning, and speech recognition. It also includes a library of datasets and models drawn from recent papers by Google Brain researchers.